Quantitative Modeling of Segmental Duration

Author

Jan P. H. van Santen

Abstract

In natural speech, durations of phonetic segments are strongly dependent on contextual factors. Quantitative descriptions of these contextual effects have applications in text-to-speech synthesis and in automatic speech recognition. In this paper, we describe a speaker-dependent system for predicting segmental duration from text, with emphasis on the statistical methods used for its construction. We also report results of a subjective listening experiment evaluating an implementation of this system for text-to-speech synthesis purposes.

1. INTRODUCTION

This paper describes a system for prediction of segmental duration from text. In most text-to-speech synthesizer architectures, a duration prediction system is embedded in a sequence of modules, where it is preceded by modules that compute various linguistic features¹ from text. For example, the word "unit" might be represented as a sequence of five feature vectors: (</j/, word initial, monosyllabic, ...>), ..., (</t/, first, word final, monosyllabic, ...>). In automatic speech recognition, a (hypothesized) phone is usually annotated only in terms of the preceding and following phones. If some form of lexical access is performed, more complete contextual feature vectors can be computed.

Broadly speaking, construction of duration prediction systems has been approached in two ways. One is to use general-purpose statistical methods such as CART² or neural nets. In CART, for example, a tree is constructed by making binary splits on factors that minimize the variance of the durations in the two subsets defined by the split [2]. These methods are called "general purpose" because they can be used across a variety of substantive domains. There also exists an older tradition, exemplified by Klatt [3, 4, 5] and others [6, 7, 8, 9], in which duration is computed with duration models, i.e., simple arithmetic models specifically designed for segmental duration.
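The CART splitting criterion described above can be illustrated with a short sketch. It is not the paper's system or the API of any CART package. Segments are assumed to be pairs of a feature dictionary and a duration in milliseconds, and, as a simplification of full CART (which may send any subset of a factor's levels to each side), each candidate split here sends one level of one factor to the left subset and all other levels to the right. The chosen split is the one with the smallest summed squared deviation of durations within the two subsets, which is equivalent to minimizing their size-weighted variance. The factor names and durations are hypothetical.

```python
from statistics import fmean


def sse(durations):
    """Sum of squared deviations from the mean; 0.0 for an empty list."""
    if not durations:
        return 0.0
    mean = fmean(durations)
    return sum((d - mean) ** 2 for d in durations)


def best_split(segments, factors):
    """Return (factor, level, cost) for the one-level-versus-rest split that
    minimizes the summed within-subset squared deviation of durations.

    segments: list of (features, duration_ms), features a dict factor -> level
    factors:  names of the factors that may be split on
    """
    best = None
    for factor in factors:
        for level in sorted({feats[factor] for feats, _ in segments}):
            left = [d for feats, d in segments if feats[factor] == level]
            right = [d for feats, d in segments if feats[factor] != level]
            if not left or not right:
                continue  # a binary split needs two non-empty subsets
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (factor, level, cost)
    return best


# Hypothetical toy data: durations in milliseconds.
segments = [
    ({"stress": "1-stressed", "position": "word final"}, 120.0),
    ({"stress": "1-stressed", "position": "word initial"}, 110.0),
    ({"stress": "unstressed", "position": "word final"}, 70.0),
    ({"stress": "unstressed", "position": "word initial"}, 60.0),
]

print(best_split(segments, ["stress", "position"]))
# prints: ('stress', '1-stressed', 100.0)
```

Growing a full tree would apply `best_split` recursively to each subset until some stopping rule (for example, a minimum subset size) is met; that recursion is omitted here.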
For example, in Klatt's model, the duration for feature vector f ∈ F is given by

¹ We define a factor, Fᵢ, to be a partition of mutually exclusive and exhaustive possibilities such as {1-stressed, 2-stressed, unstressed}. A feature is a "level" on a factor, such as 1-stressed. The feature space F is the product space of all factors: F₁ × ⋯ × Fₙ. Because of phonotactic and other constraints, only a small fraction of this space can actually occur in a language; we call this the linguistic space.

² Classification and Regression Trees [1].
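The definitions in footnote 1 can be made concrete with a small sketch. The factor names, their levels, and the tiny corpus below are hypothetical, and approximating the linguistic space by the distinct feature vectors attested in a corpus is an assumption of this sketch, not a procedure stated in the text. The feature space itself is enumerated as the Cartesian product of the factors' levels.

```python
from itertools import product

# Hypothetical factors. Each factor partitions the possibilities into
# mutually exclusive, exhaustive levels (features), as in footnote 1.
FACTORS = {
    "stress": ("1-stressed", "2-stressed", "unstressed"),
    "word_position": ("word initial", "word medial", "word final"),
    "phone": ("/j/", "/u/", "/n/", "/ɪ/", "/t/"),
}


def feature_space(factors):
    """Enumerate F = F1 x ... x Fn: one level chosen from every factor."""
    return list(product(*factors.values()))


def linguistic_space(observed, factors):
    """Approximate the linguistic space by the distinct vectors of F
    that are attested in `observed` (an assumption of this sketch)."""
    space = set(feature_space(factors))
    return {vec for vec in observed if vec in space}


F = feature_space(FACTORS)

# Hypothetical attested feature vectors (stress, word position, phone).
observed = [
    ("1-stressed", "word initial", "/j/"),
    ("unstressed", "word final", "/t/"),
    ("1-stressed", "word initial", "/j/"),  # duplicates collapse in the set
]
L = linguistic_space(observed, FACTORS)

print(len(F), len(L), f"{len(L) / len(F):.1%}")  # prints: 45 2 4.4%
```

Even three small factors give 45 cells. With realistic numbers of factors and levels the product space grows multiplicatively, so the attested subset is typically a small fraction of it, in line with the footnote's observation.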

Similar sources

Duration modeling for Hindi text-to-speech synthesis system

This paper reports preliminary results of data-driven modeling of segmental (phoneme) duration for Hindi. Classification and Regression Tree (CART) based data-driven duration modeling for segmental duration prediction is presented. A number of features are considered, and their usefulness and relative contribution to segmental duration prediction are assessed. Objective evaluation of the duration...

Duration modeling of Indian languages Hindi and Telugu

This paper reports a preliminary attempt at data-driven modeling of segmental (phoneme) duration for two Indian languages, Hindi and Telugu. Classification and Regression Tree (CART) based data-driven duration modeling for segmental duration prediction is presented. A number of features are proposed, and their usefulness and relative contribution to segmental duration prediction are assessed. Obje...

F0 contour and segmental duration modeling using prosodic features

This paper proposes a framework of F0 contour generation and segmental duration modeling for application in a unit-selection speech synthesis system for Polish – BOSS. We describe the design of the F0 and duration modeling modules and emphasize the role of prosodic features (related to stress, pitch accent and phrase) in these two tasks.

Semi-quantitative segmental perfusion scoring in myocardial perfusion SPECT: visual vs. automated analysis

Introduction: It is recommended that the physician apply at least a semi-quantitative segmental scoring system in myocardial perfusion SPECT. We aimed to assess the agreement between automated semi-quantitative analysis using QPS (quantitative Perfusion SPECT) software and visual approach for calculation of summed stress score (SSS), summed rest score (SRS) and summed difference score (SDS). ...

Segmental duration modeling in Turkish

Naturalness of synthetic speech depends heavily on appropriate modeling of prosodic aspects. Typically, three prosody components are modeled: segmental duration, pitch contour and intensity. In this study, we present our work on modeling segmental duration in Turkish using machine-learning algorithms, especially Classification and Regression Trees (CART). The models predict phone durations based on ...

Publication date: 1993
